Abstract

In this Note we extend the Empirical Interpolation Method (EIM) to a regression context which accommodates noisy (experimental) data on an underlying parametric manifold. The EIM basis functions are computed Offline from the noise-free manifold; the EIM coefficients for any function on the manifold are computed Online from experimental observations through a least-squares formulation. Noise-induced errors in the EIM coefficients and in linear-functional outputs are assessed through standard confidence intervals and without knowledge of the parameter value or the noise level. We also propose an associated procedure for parameter estimation from noisy data. To cite this article: A.T. Patera, E.M. Rønquist, C. R. Acad. Sci. Paris, Ser. I XXX (2012).

Résumé

Regression on Parametric Manifolds: Estimation of Spatial Fields, Functional Outputs, and Parameters from Noisy Data

We extend the empirical interpolation method, EIM for short (for Empirical Interpolation Method), to the regression context in the presence of noisy data on a parametric manifold. The basis functions are computed offline from the noise-free manifold; the EIM coefficients of any function on the manifold are computed online from the experimental observations through a least-squares formulation. The errors induced by the noisy data in the EIM coefficients, as well as in the associated linear-functional outputs, are quantified as confidence intervals, without knowledge of either the parameter value or the noise variance. In the same spirit, we also propose a parameter estimation procedure.
We consider a parametric manifold $\mathcal{M} = \{u(\mu) \mid \mu \in \mathcal{D}\}$, where $\mu \in \mathcal{D} \mapsto u(\cdot;\mu) \in C^0(\bar\Omega)$ for $\Omega \subset \mathbb{R}^d$. We then introduce an output $y(\mu) = \ell(u(\mu))$, where $\ell$ is a bounded linear functional.

The empirical interpolation method [1,2], EIM for short (for Empirical Interpolation Method), provides us with an $n$-dimensional approximation space $W$, associated basis functions $q_j(x)$, $1 \le j \le n$, collocation nodes $x_k$, $1 \le k \le n$, and a lower-triangular interpolation matrix $B_{kj} = q_j(x_k)$, $1 \le k, j \le n$. Our approximation of the field $u(x;\mu)$ (respectively, of the output $y(\mu)$) is thus $\tilde u(x;\mu)$ given by (1) (respectively, $\tilde y(\mu) = \ell(\tilde u(\cdot;\mu))$), where $\beta(\mu)$ is the solution of the $n \times n$ system of equations $B\beta(\mu) = U(\mu)$ for $U_k(\mu) = u(x_k;\mu)$, $1 \le k \le n$.

We next consider noisy data of the form (2), in which $\mu^*$ is unknown and $\varepsilon$ is normally distributed with zero mean, uncorrelated in $x$, and homoscedastic with standard deviation $\sigma$; note that $m'$ is the number of (experimental) observations at each collocation point $x_k$, $1 \le k \le n$. From these noisy data we can deduce a field approximation (4), where $\hat\beta(\mu^*)$ is the solution of the normal equations $S\hat\beta(\mu^*) = B^T V$ for $S = B^T B$ and $V$ given in (3); the associated output is then computed as $\hat y(\mu^*) = \ell(\hat u(\cdot;\mu^*))$. We also define the sample standard deviation $\hat\sigma(\mu^*)$, see (5). For what follows, we note that $S$ induces a norm $\|\cdot\|_S = \|B\,\cdot\|$, where $\|\cdot\|$ denotes the Euclidean norm.

In Proposition 2.1 we provide a confidence interval in the $\|\cdot\|_S$ norm for the EIM coefficients $\beta(\mu^*)$ in terms of the regression coefficients $\hat\beta(\mu^*)$, the number of experimental observations at each collocation point, $m'$, and the quantity $\rho(\mu^*)$, which involves the sample standard deviation $\hat\sigma(\mu^*)$ and the quantile of the F distribution at confidence level $\gamma$. Next, in Corollary 2.1, we propose a confidence interval for the output $y(\mu^*)$ in terms of $m'$ and $\rho(\mu^*)$ and, in addition, the output vector $L_j = \ell(q_j)$, $1 \le j \le n$. (We assume in this Note that the EIM approximation error, the second term in the bound of Corollary 2.1, is negligible.) Results for a two-parameter Gaussian function perturbed by synthetic noise confirm the predicted behavior of the confidence intervals for the EIM coefficients as well as for the output.

To conclude, Proposition 3.1 provides the bound, at confidence level $\gamma$, $\|\beta(\bar\mu) - \hat\beta(\mu^*)\|_S \le \epsilon(\mu^*; r) = \rho(\mu^*)/\sqrt{m'} + r$ for any parameter value $\bar\mu$ in a set of candidate values $\Upsilon \subset \mathcal{D}$ such that $\mu^*$ lies in a neighborhood $\mathcal{N}(\bar\mu, r) = \{\mu' \in \mathcal{D} \mid \|U(\mu') - U(\bar\mu)\| \le r\}$. This bound of Proposition 3.1 can serve as a criterion to identify, within the candidate set $\Upsilon$, a consistent set $\Upsilon_{\rm con}$ of parameter values compatible with the experimental observations, given in (6). In Figure 1 we present numerical results for our two-parameter Gaussian example (with parameter $\mu^* = (0.55, 0.55)$) and a uniform candidate set $\Upsilon$ of cardinality 40,000: the open circles indicate the parameter values in $\Upsilon_{\rm con}$.


1. Introduction

Recent advances in model order reduction (in this paper we focus on the Empirical Interpolation Method (EIM) [1,2]) exploit an underlying parametric manifold for purposes of field or state approximation, functional output approximation, and also parameter estimation. In the EIM we first construct a low-dimensional approximation space to represent the manifold and identify an associated set of ad hoc collocation points; we then approximate any particular function (field) on the parametric manifold by interpolation. In [3,4] the EIM is extended to an experimental context in which the space and collocation points are generated by an appropriate model for the manifold (for example, solutions of a partial differential equation), but the interpolation data are then provided by measurements. In this Note we extend the "experimental version" of the EIM to a regression context [5] which accommodates noisy data and furthermore provides an assessment of noise-induced error through standard confidence intervals.

We assume that we are given a parametric manifold of functions, $\mathcal{M} = \{u(\cdot;\mu) \mid \mu \in \mathcal{D}\}$, where for any given $\mu$ in the compact parameter domain $\mathcal{D} \subset \mathbb{R}^P$ the field $u(\cdot;\mu)$ is a function in $C^0(\bar\Omega)$ for some prescribed $d$-dimensional spatial domain $\Omega$. We further introduce an output $y(\mu) = \ell(u(\cdot;\mu))$, where $\ell$ is a bounded linear functional. We presume that $u(x;\mu)$ is piecewise linear over some fine simplex discretization of $\Omega$.

The EIM then provides an $n$-dimensional approximation space $W$; an associated set of basis functions $q_j(x)$, $1 \le j \le n$, such that $W = \mathrm{span}\{q_j, 1 \le j \le n\}$; and a set of collocation points $x_k$, $1 \le k \le n$. The functions $q_j(x)$ are normalized such that $\max_{x \in \bar\Omega} |q_j(x)| = 1$. We next construct an $n \times n$ interpolation matrix $B$ of the form $B_{kj} = q_j(x_k)$, $1 \le k, j \le n$, which is lower triangular with unity main diagonal. We now fix $n$ and approximate $u(x;\mu)$ for any given $\mu \in \mathcal{D}$: we define the $n \times 1$ vector $U(\mu)$ with elements $U_k(\mu) = u(x_k;\mu)$, $1 \le k \le n$; find coefficients $\beta_j(\mu)$, $1 \le j \le n$, solution of the $n \times n$ system $B\beta(\mu) = U(\mu)$; construct the EIM interpolant as

$$\tilde u(x;\mu) = \sum_{j=1}^{n} \beta_j(\mu)\, q_j(x); \qquad (1)$$

and evaluate our output approximation from $\tilde y(\mu) = \ell(\tilde u(\cdot;\mu))$. The EIM is an interpolation scheme: $\tilde u(x_k;\mu) = u(x_k;\mu)$, $1 \le k \le n$. In this paper we generate the space $W$ by a Greedy procedure which provides the error bound $\sup_{\mu \in \Xi} \sup_{x \in \bar\Omega} |u(x;\mu) - \tilde u(x;\mu)| \le \tau$, where $\Xi$ is a "training" set of points in $\mathcal{D}$.
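The Greedy construction can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical 1-D analogue of the Gaussian example used later, a 101-point grid standing in for $\Omega$, and 21 training parameters standing in for $\Xi$. Each step selects the training function worst reproduced by the current interpolant, places the next collocation point where its residual peaks, and scales that residual so that $q_j(x_j) = 1$; this gives $\max |q_j| = 1$ and makes $B$ lower triangular with unit diagonal. All function names are illustrative.

```python
import math

def field(x, mu):
    # Hypothetical 1-D analogue of the Gaussian example: exp(-(x - mu)^2 / 0.02).
    return math.exp(-(x - mu) ** 2 / 0.02)

def forward_solve(B, rhs):
    # Forward substitution for the lower-triangular system B c = rhs.
    c = []
    for k in range(len(rhs)):
        c.append((rhs[k] - sum(B[k][j] * c[j] for j in range(k))) / B[k][k])
    return c

def interp_matrix(q, idx):
    # B[k][j] = q_j(x_k), where idx[k] is the grid index of collocation point x_k.
    return [[q[j][idx[k]] for j in range(len(q))] for k in range(len(idx))]

def eim_residual(u, q, idx):
    # u minus its EIM interpolant, evaluated on the whole grid.
    beta = forward_solve(interp_matrix(q, idx), [u[p] for p in idx])
    return [u[p] - sum(b * qj[p] for b, qj in zip(beta, q)) for p in range(len(u))]

def sup_norm(v):
    return max(abs(t) for t in v)

def greedy_eim(snapshots, n_max, tol):
    q, idx = [], []
    while len(q) < n_max:
        # Training function that the current interpolant reproduces worst.
        worst = max((eim_residual(u, q, idx) for u in snapshots), key=sup_norm)
        if sup_norm(worst) <= tol:
            break
        # Next collocation point where that residual peaks; scale so q_j(x_j) = 1,
        # which also gives max |q_j| = 1.
        p_star = max(range(len(worst)), key=lambda p: abs(worst[p]))
        idx.append(p_star)
        q.append([t / worst[p_star] for t in worst])
    return q, idx, interp_matrix(q, idx)

grid = [i / 100 for i in range(101)]              # stands in for Omega = (0, 1)
train = [0.4 + 0.01 * t for t in range(21)]       # stands in for Xi in D = [0.4, 0.6]
snapshots = [[field(x, mu) for x in grid] for mu in train]
q, idx, B = greedy_eim(snapshots, n_max=30, tol=1e-3)
print(len(q), "terms; collocation points:", [grid[p] for p in idx])
```

The interpolation property $\tilde u(x_k;\mu) = u(x_k;\mu)$ holds for any $\mu$, including parameters outside the training set, because $\beta(\mu)$ solves $B\beta(\mu) = U(\mu)$ exactly.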


2. Regression Framework

In this paper we shall presume that we are provided with experimental data of the form

$$u^{\rm exp}(x_k;\omega_{k;i}) = u(x_k;\mu^*) + \varepsilon(x_k;\omega_{k;i}), \quad 1 \le k \le n, \ 1 \le i \le m', \qquad (2)$$

where $\varepsilon$ is assumed normal, zero-mean, uncorrelated in space, and homoscedastic with standard deviation $\sigma$. Note that $\omega_{k;i}$, $1 \le k \le n$, $1 \le i \le m'$, corresponds to $m'$ realizations (repeated measurements) at collocation point $x_k$ such that, in total, $m = m'n$ measurements are available. The conceit is that neither $\mu^*$ nor $\sigma$ is known and that we wish to determine $u(x;\mu^*)$ (state estimation), $y(\mu^*)$ (output estimation), and perhaps also $\mu^*$ (parameter estimation).

We pursue a least-squares approximation in linear regression fashion [5]. We form the $n \times n$ matrix $S = B^T B$ (superscript $T$ refers to transpose) and the $n \times 1$ vector $V$,

$$V_k = \frac{1}{m'} \sum_{i=1}^{m'} u^{\rm exp}(x_k;\omega_{k;i}), \quad 1 \le k \le n. \qquad (3)$$

We then find $\hat\beta(\mu^*)$ from the normal equations $S\hat\beta(\mu^*) = B^T V$ to obtain our state approximation

$$\hat u(x;\mu^*) = \sum_{j=1}^{n} \hat\beta_j(\mu^*)\, q_j(x), \qquad (4)$$

and subsequently our output approximation $\hat y(\mu^*) = \ell(\hat u(\cdot;\mu^*))$.¹ We also define the sample standard deviation as

$$\hat\sigma(\mu^*) = \sqrt{\frac{1}{m-n} \sum_{k=1}^{n} \sum_{i=1}^{m'} \left( u^{\rm exp}(x_k;\omega_{k;i}) - \hat u(x_k;\mu^*) \right)^2}. \qquad (5)$$

We make two notational remarks: first, we suppress $\omega$ but in fact $\hat\beta(\mu^*)$ is random; second, we let the context determine $\hat\beta(\mu^*)$ as either a random variable or as a realization of a random variable.

¹ Note that the normal equations take a particularly simple form with respect to $B$ given our assumption of $m'$ replicated measurements for each point $x_k$, $1 \le k \le n$.
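The Online regression step, (3) through (5), can be sketched as follows, assuming a hypothetical $3 \times 3$ matrix $B$ and synthetic data drawn from model (2). Because $B$ is square and invertible (lower triangular with unit diagonal), the normal equations $S\hat\beta = B^T V$ reduce to the triangular system $B\hat\beta = V$, which is one reading of the simplification mentioned in footnote 1; consequently $\hat u(x_k;\mu^*) = V_k$ in (5). Function and variable names are illustrative only.

```python
import math
import random

def forward_solve(B, rhs):
    # Forward substitution for a lower-triangular system B c = rhs.
    c = []
    for k in range(len(rhs)):
        c.append((rhs[k] - sum(B[k][j] * c[j] for j in range(k))) / B[k][k])
    return c

def regression_estimate(B, data):
    """data[k][i] holds u^exp(x_k; omega_{k;i}); returns (V, beta_hat, sigma_hat)."""
    n, m_prime = len(data), len(data[0])
    m = n * m_prime
    V = [sum(row) / m_prime for row in data]                  # eq. (3)
    # S beta_hat = B^T V with B invertible is equivalent to B beta_hat = V.
    beta_hat = forward_solve(B, V)
    # u_hat(x_k) = (B beta_hat)_k = V_k, so the residuals in (5) are deviations from V_k.
    rss = sum((d - V[k]) ** 2 for k, row in enumerate(data) for d in row)
    sigma_hat = math.sqrt(rss / (m - n))                       # eq. (5)
    return V, beta_hat, sigma_hat

# Hypothetical 3 x 3 EIM matrix (lower triangular, unit diagonal) and true coefficients.
B = [[1.0, 0.0, 0.0], [0.4, 1.0, 0.0], [-0.2, 0.3, 1.0]]
beta_true = [0.9, -0.5, 0.25]
U_true = [sum(B[k][j] * beta_true[j] for j in range(3)) for k in range(3)]
rng = random.Random(0)
sigma, m_prime = 0.01, 16
data = [[Uk + rng.gauss(0.0, sigma) for _ in range(m_prime)] for Uk in U_true]  # model (2)
V, beta_hat, sigma_hat = regression_estimate(B, data)
print(beta_hat, sigma_hat)
```

Solving the triangular system costs $O(n^2)$ operations and forming $V$ and $\hat\sigma$ costs $O(m)$, so no computation over the full spatial domain is needed Online.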

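We now define $\|\cdot\|$ as the Euclidean norm and then $\|v\|_S = \sqrt{v^T S v}$ $(= \|Bv\|)$. We further define $\rho(\mu^*) = \hat\sigma(\mu^*)\sqrt{n F(n, m-n, \gamma)}$, where $F(k_1, k_2, \gamma)$ is the $\gamma$-quantile of the F distribution with $k_1$ and $k_2$ degrees of freedom [5]. We may then state

Proposition 2.1 With confidence level $\gamma$, $\|\beta(\mu^*) - \hat\beta(\mu^*)\|_S \le \rho(\mu^*)/\sqrt{m'}$.

We now sketch the proof. We first note that $E[u^{\rm exp}(x_k;\omega_{k;i})] = U_k(\mu^*) = (B\beta(\mu^*))_k$, $1 \le i \le m'$, $1 \le k \le n$, where $E$ denotes expectation. The data therefore follow a linear regression model with unknown coefficients $\beta(\mu^*)$, whose design matrix stacks $m'$ copies of $B$ and hence has Gram matrix $m' B^T B$. Our confidence ellipse [5] is thus given by $(\beta(\mu^*) - \hat\beta(\mu^*))^T\, m' B^T B\, (\beta(\mu^*) - \hat\beta(\mu^*)) \le \rho^2(\mu^*)$. The result then directly follows from the definition of $S$ and $\|\cdot\|_S$.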
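Evaluating $\rho(\mu^*)$ requires the quantile $F(n, m-n, \gamma)$. The sketch below uses only the Python standard library: it evaluates the F cumulative distribution function through the regularized incomplete beta function, $P(F \le f) = I_{k_1 f/(k_1 f + k_2)}(k_1/2, k_2/2)$, computed with the classical continued-fraction (modified Lentz) evaluation, and inverts it by bisection. All function names are hypothetical; in practice a statistics library's F quantile routine would normally be used instead.

```python
import math

def _betacf(a, b, x, max_iter=1000, eps=1e-15):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        # Even and odd coefficients of the continued fraction.
        for aa in (k * (b - k) * x / ((qam + k2) * (a + k2)),
                   -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, k1, k2):
    # P(F <= f) for an F(k1, k2) random variable.
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(k1 / 2.0, k2 / 2.0, k1 * f / (k1 * f + k2))

def f_quantile(gamma, k1, k2):
    # F(k1, k2, gamma): bracket the quantile, then bisect on the CDF.
    lo, hi = 0.0, 1.0
    while f_cdf(hi, k1, k2) < gamma:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, k1, k2) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rho(sigma_hat, n, m, gamma):
    # rho(mu*) = sigma_hat(mu*) * sqrt(n F(n, m - n, gamma)).
    return sigma_hat * math.sqrt(n * f_quantile(gamma, n, m - n))

# Settings of the numerical example below: n = 33, m' = 16, so m = 528.
print(f_quantile(0.95, 33, 528 - 33), rho(0.01, 33, 528, 0.95))
```

For $k_1 = 2$ the quantile has the closed form $(k_2/2)\left((1-\gamma)^{-2/k_2} - 1\right)$, which provides a convenient check of the general routine.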

The crucial point is that the EIM model is unbiased due, first, to the assumed form of the experimental data, (2), as a perturbation on our manifold, and second, to the interpolation property of the EIM approximation. Note that we interpret confidence levels in the frequentist sense: the bound of Proposition 2.1 will hold in a fraction $\gamma$ of (sufficiently many) realizations.
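The frequentist reading of Proposition 2.1 can be checked by simulation. For brevity the sketch is restricted to a hypothetical $n = 2$ problem (an arbitrary lower-triangular $B$ with unit diagonal and arbitrary coefficients $\beta(\mu^*)$), for which the F quantile has the closed form $F(2, k_2, \gamma) = (k_2/2)\left((1-\gamma)^{-2/k_2} - 1\right)$. The function counts the fraction of simulated experiments in which $\|\beta(\mu^*) - \hat\beta(\mu^*)\|_S \le \rho(\mu^*)/\sqrt{m'}$; that fraction should be close to $\gamma$.

```python
import math
import random

def coverage(B, beta_true, sigma, m_prime, gamma, trials, seed=0):
    """Fraction of simulated experiments in which Proposition 2.1 holds (n = 2 only)."""
    n = 2
    m = n * m_prime
    k2 = m - n
    # Closed-form F(2, k2, gamma); valid only because n = 2.
    F = (k2 / 2.0) * ((1.0 - gamma) ** (-2.0 / k2) - 1.0)
    U = [B[0][0] * beta_true[0], B[1][0] * beta_true[0] + B[1][1] * beta_true[1]]
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [[U[k] + rng.gauss(0.0, sigma) for _ in range(m_prime)] for k in range(n)]
        V = [sum(row) / m_prime for row in data]
        # B beta_hat = V by forward substitution (2 x 2 lower triangular).
        b0 = V[0] / B[0][0]
        b1 = (V[1] - B[1][0] * b0) / B[1][1]
        rss = sum((d - V[k]) ** 2 for k in range(n) for d in data[k])
        rho = math.sqrt(rss / (m - n)) * math.sqrt(n * F)
        e0, e1 = beta_true[0] - b0, beta_true[1] - b1
        s_norm = math.hypot(B[0][0] * e0, B[1][0] * e0 + B[1][1] * e1)  # ||B e||
        hits += s_norm <= rho / math.sqrt(m_prime)
    return hits / trials

B = [[1.0, 0.0], [0.37, 1.0]]          # hypothetical EIM matrix
frac = coverage(B, [0.8, -0.3], sigma=0.01, m_prime=16, gamma=0.95, trials=4000)
print(frac)
```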

We now define the $n \times 1$ output vector $L$ as $L_j = \ell(q_j)$, $1 \le j \le n$, such that $\tilde y(\mu^*) = L^T\beta(\mu^*)$ and $\hat y(\mu^*) = L^T\hat\beta(\mu^*)$. We may then further prove

Corollary 2.1 With confidence level $\gamma$, $|y(\mu^*) - \hat y(\mu^*)| \le \Delta^y(\mu^*) + |y(\mu^*) - \tilde y(\mu^*)|$, where $\Delta^y(\mu^*) = \frac{\rho(\mu^*)}{\sqrt{m'}}\sqrt{L^T S^{-1} L}$.

We now sketch the proof. We first note from the definition of $L$ and the triangle inequality that $|y(\mu^*) - \hat y(\mu^*)| \le |y(\mu^*) - \tilde y(\mu^*)| + |L^T(\beta(\mu^*) - \hat\beta(\mu^*))|$. We next note that the maximum of $L^T a$ (respectively, the minimum of $L^T a$) subject to the constraint $a^T S a \le C^2$ is given by $C\sqrt{L^T S^{-1} L}$ (respectively, $-C\sqrt{L^T S^{-1} L}$). The result then directly follows from Proposition 2.1, with $a = \beta(\mu^*) - \hat\beta(\mu^*)$ and $C = \rho(\mu^*)/\sqrt{m'}$. Note that we may apply Corollary 2.1 (jointly) over any number of different outputs, including (with appropriate regularity assumptions) point values of the field.
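Because $S = B^T B$, we have $L^T S^{-1} L = \|B^{-T} L\|^2$, so $\Delta^y(\mu^*)$ needs only one triangular solve, an $O(n^2)$ cost. The sketch below uses a hypothetical $3 \times 3$ matrix $B$, output vector $L$, and value of $\rho$; it also builds the maximizer $a^* = C\,S^{-1}L/\sqrt{L^T S^{-1} L}$ used in the proof, with $C = \rho(\mu^*)/\sqrt{m'}$, so that $L^T a^* = \Delta^y(\mu^*)$ and $a^{*T} S a^* = C^2$.

```python
import math

def forward_solve(B, rhs):
    # Solve B c = rhs for lower-triangular B.
    c = []
    for k in range(len(rhs)):
        c.append((rhs[k] - sum(B[k][j] * c[j] for j in range(k))) / B[k][k])
    return c

def back_solve_transpose(B, rhs):
    # Solve B^T z = rhs; B^T is upper triangular, so substitute from the last row up.
    n = len(rhs)
    z = [0.0] * n
    for j in reversed(range(n)):
        z[j] = (rhs[j] - sum(B[k][j] * z[k] for k in range(j + 1, n))) / B[j][j]
    return z

def output_bound(B, L, rho, m_prime):
    # Delta^y = rho / sqrt(m') * sqrt(L^T S^{-1} L), with L^T S^{-1} L = ||B^{-T} L||^2.
    z = back_solve_transpose(B, L)
    return rho / math.sqrt(m_prime) * math.sqrt(sum(t * t for t in z))

def maximizer(B, L, C):
    # a* = C S^{-1} L / sqrt(L^T S^{-1} L) maximizes L^T a subject to a^T S a <= C^2.
    z = back_solve_transpose(B, L)
    s_inv_L = forward_solve(B, z)          # S^{-1} L = B^{-1} (B^{-T} L)
    norm = math.sqrt(sum(t * t for t in z))
    return [C * t / norm for t in s_inv_L]

# Hypothetical inputs: EIM matrix B, output vector L_j = ell(q_j), rho and m'.
B = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-0.25, 0.75, 1.0]]
L = [0.3, -1.0, 0.6]
rho_val, m_prime = 0.05, 16
delta_y = output_bound(B, L, rho_val, m_prime)
a_star = maximizer(B, L, rho_val / math.sqrt(m_prime))
print(delta_y, a_star)
```

The output interval is then $\hat y(\mu^*) \pm \Delta^y(\mu^*)$, up to the neglected EIM error term.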

We describe a paradigm in which Corollary 2.1 might prove useful. (We consider the case, as in the examples below, in which the error in the EIM approximation is sufficiently small that the second term in the bound of Corollary 2.1 may be neglected.) We presume that the EIM approximation, and in particular $S$, is formed in an Offline stage. Then, in the Online stage, we conduct an experiment to form $V$ in $m$ operations, find $\hat\beta(\mu^*)$ in $n^2$ operations, calculate $\hat\sigma(\mu^*)$ in $m$ operations, evaluate $\hat y(\mu^*)$ in $n$ operations, and finally compute $\Delta^y(\mu^*)$ in $n^2$ operations (for any $L$ defined in the Online stage). Thus all computations over $\Omega$ are replaced by calculations over the very few points $x_k$, $1 \le k \le n$. Note that $\mu^*$ is not known, nor is $\mu^*$ deduced as part of the Online calculations.

We now turn to numerical results. In particular, we introduce the parametrized function $u(x;\mu) = \exp\left(-\left((x_1 - \mu_1)^2 + (x_2 - \mu_2)^2\right)/0.02\right)$ for $x = (x_1, x_2) \in \Omega = (0,1)^2$ and a parameter vector $\mu = (\mu_1, \mu_2) \in \mathcal{D} = [0.4, 0.6]^2$. We construct an EIM approximation with $n = 33$ terms, which yields error $\tau = 10^{-3}$ over a $200 \times 200$ uniform grid $\Xi$ in $\mathcal{D}$. Our output functional is $\ell(v) = v(0.4, 0.5) - v(0.6, 0.5)$. We now consider the particular choice $\mu^* = (0.55, 0.55)$ with associated output $y(\mu^*) = -0.4922$. We first verify Proposition 2.1 for the case $\gamma = 0.95$, $m' = 16$, and $\sigma = 0.01$: the inequality is satisfied in 95% of 10,000 realizations. We next consider Corollary 2.1 for $\gamma = 0.95$ and $m' = 16$ and present results for the sample standard deviation $\hat\sigma(\mu^*)$, the output approximation $\hat y(\mu^*)$, and the output error bound $\Delta^y(\mu^*)$: (i) $\sigma = 0.0010$ gives $\hat\sigma(\mu^*) = 0.0011$, $\hat y(\mu^*) = -0.4922$ and $\Delta^y(\mu^*) = 0.0037$; (ii) $\sigma = 0.0100$ gives $\hat\sigma(\mu^*) = 0.0096$, $\hat y(\mu^*) = -0.4963$ and $\Delta^y(\mu^*) = 0.0340$; (iii) $\sigma = 0.0500$ gives $\hat\sigma(\mu^*) = 0.0503$, $\hat y(\mu^*) = -0.4768$ and $\Delta^y(\mu^*) = 0.1824$. In conclusion, $\hat\sigma(\mu^*) \approx \sigma$ and $\Delta^y(\mu^*) \approx 3.5\,\hat\sigma(\mu^*)$. Finally, we again consider the case $\gamma = 0.95$ and $\sigma = 0.0100$ but now for $m' = 64$: we obtain $\hat\sigma(\mu^*) = 0.0099$, $\hat y(\mu^*) = -0.4893$ and $\Delta^y(\mu^*) = 0.018$; as expected from the factor $1/\sqrt{m'}$ in $\Delta^y(\mu^*)$, the output error bound is decreased by a factor of (approximately) two when $m'$ is increased fourfold.
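As a quick consistency check on the quoted figures, the sketch below evaluates the exact output of the Gaussian example at $\mu^* = (0.55, 0.55)$, about $-0.49230$, which matches the quoted $y(\mu^*) = -0.4922$ to within one unit in the last digit. It also recomputes, from the reported values alone, the ratio $\Delta^y(\mu^*)/\hat\sigma(\mu^*)$ and the reduction factor of $\Delta^y(\mu^*)$ between $m' = 16$ and $m' = 64$. No EIM or regression is performed here.

```python
import math

def u(x1, x2, mu1, mu2):
    # The two-parameter Gaussian field of the numerical example.
    return math.exp(-((x1 - mu1) ** 2 + (x2 - mu2) ** 2) / 0.02)

def ell(mu1, mu2):
    # Output functional ell(v) = v(0.4, 0.5) - v(0.6, 0.5), applied to the exact field.
    return u(0.4, 0.5, mu1, mu2) - u(0.6, 0.5, mu1, mu2)

y_star = ell(0.55, 0.55)

# Reported (sigma_hat, Delta_y) pairs for m' = 16, copied from the text.
reported = [(0.0011, 0.0037), (0.0096, 0.0340), (0.0503, 0.1824)]
ratios = [d / s for s, d in reported]
shrink = 0.0340 / 0.018   # sigma = 0.01, m' = 16 versus m' = 64
print(y_star, ratios, shrink)
```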


3. Parameter Estimation

We can also apply the framework to parameter estimation [6]. Towards that end, we introduce a candidate set $\Upsilon \subset \mathcal{D}$ of cardinality $K$. We also define $\mathcal{N}(\bar\mu, r) = \{\mu' \in \mathcal{D} \mid \|U(\mu') - U(\bar\mu)\| \le r\}$, which represents a ball near $\bar\mu$, and $\epsilon(\mu^*; r) = \rho(\mu^*)/\sqrt{m'} + r$. We may then claim

Proposition 3.1 With confidence level $\gamma$, $\|\beta(\bar\mu) - \hat\beta(\mu^*)\|_S \le \epsilon(\mu^*; r)$ for any $\bar\mu \in \Upsilon$ such that $\mu^* \in \mathcal{N}(\bar\mu, r)$.

We now sketch the proof. We first note from the triangle inequality that $\|\beta(\bar\mu) - \hat\beta(\mu^*)\|_S \le \|\beta(\bar\mu) - \beta(\mu^*)\|_S + \|\beta(\mu^*) - \hat\beta(\mu^*)\|_S$. We next note from the EIM system $B\beta(\mu) = U(\mu)$ that $\|\beta(\bar\mu) - \beta(\mu^*)\|_S = \|B\beta(\bar\mu) - B\beta(\mu^*)\| = \|U(\bar\mu) - U(\mu^*)\|$. The result then follows from the definition of $\mathcal{N}(\bar\mu, r)$ and Proposition 2.1.
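The key identity in this proof, $\|\beta(\bar\mu) - \beta(\mu^*)\|_S = \|U(\bar\mu) - U(\mu^*)\|$, can be confirmed numerically. In the sketch below the matrix $B$ and the vectors standing in for $U(\bar\mu)$ and $U(\mu^*)$ are arbitrary hypothetical values, and `epsilon` simply evaluates the right-hand side $\epsilon(\mu^*; r)$ of Proposition 3.1.

```python
import math

def forward_solve(B, rhs):
    # Solve B c = rhs for lower-triangular B.
    c = []
    for k in range(len(rhs)):
        c.append((rhs[k] - sum(B[k][j] * c[j] for j in range(k))) / B[k][k])
    return c

def s_norm(B, v):
    # ||v||_S = sqrt(v^T S v) = ||B v|| with S = B^T B.
    Bv = [sum(B[k][j] * v[j] for j in range(len(v))) for k in range(len(B))]
    return math.sqrt(sum(t * t for t in Bv))

def epsilon(rho, m_prime, r):
    # epsilon(mu*; r) = rho(mu*) / sqrt(m') + r, the right-hand side of Proposition 3.1.
    return rho / math.sqrt(m_prime) + r

B = [[1.0, 0.0, 0.0], [0.6, 1.0, 0.0], [0.1, -0.4, 1.0]]   # hypothetical EIM matrix
U_bar = [0.82, 0.31, 0.55]                                   # stands in for U(mu_bar)
U_star = [0.80, 0.35, 0.52]                                  # stands in for U(mu*)
beta_bar, beta_star = forward_solve(B, U_bar), forward_solve(B, U_star)
lhs = s_norm(B, [a - b for a, b in zip(beta_bar, beta_star)])
rhs = math.sqrt(sum((a - b) ** 2 for a, b in zip(U_bar, U_star)))
print(lhs, rhs, epsilon(0.05, 16, 0.01))
```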

We emphasize that the accuracy of the EIM approximation does not affect the validity of our claim, since the proof invokes only the EIM system $B\beta(\mu) = U(\mu)$ at the collocation points and Proposition 2.1, and never the error bound $\tau$. We can also develop from Proposition 3.1 a test for the hypothesis that the experimental data are indeed obtained from the postulated manifold. Note that the restriction to the manifold effectively regularizes the inverse problem.

We briefly describe a paradigm associated with Proposition 3.1. In the Offline stage we compute, for all $\bar\mu$ in $\Upsilon$, the EIM interpolant and associated $\beta(\bar\mu)$: total storage $Kn$. We also choose $r$ to be greater than a minimum distance $\|U(\bar\mu_1) - U(\bar\mu_2)\|$ between points $\bar\mu_1, \bar\mu_2$ in $\Upsilon$ to ensure adequate coverage of $\mathcal{D}$. Then, in the Online stage,² we find a consistent set $\Upsilon_{\rm con}$ (a set of parameter values consistent with the experimental data) given by

$$\Upsilon_{\rm con} = \{\bar\mu \in \Upsilon \mid \|\beta(\bar\mu) - \hat\beta(\mu^*)\|_S \le \epsilon(\mu^*; r)\}. \qquad (6)$$

The construction of $\Upsilon_{\rm con}$ requires $Kn$ operations.
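The Offline/Online construction of (6) can be sketched in a hypothetical 1-D setting: a single parameter $\mu \in [0.4, 0.6]$, $n = 2$ arbitrary collocation points (so that the closed-form F quantile for $n = 2$ applies), and $K = 201$ candidates. Since $B\hat\beta(\mu^*) = V$ and $B\beta(\bar\mu) = U(\bar\mu)$, the test in (6) can be evaluated as $\|U(\bar\mu) - V\|$, which costs $O(n)$ per candidate and $Kn$ in total; this is one way to reach that operation count, not necessarily the authors'. The choice of $r$ as the largest distance between neighboring candidates is likewise an assumed heuristic, and $\mu^*$ is used only to simulate the experiment. Consistent with the remark above, the crude two-point EIM does not invalidate the construction.

```python
import math
import random

def field(x, mu):
    # Hypothetical 1-D analogue of the Gaussian example.
    return math.exp(-(x - mu) ** 2 / 0.02)

def dist(a, b):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))

X = [0.45, 0.6]                                   # hypothetical collocation points, n = 2
n, m_prime, gamma, sigma = len(X), 16, 0.95, 0.02
m = n * m_prime
candidates = [0.4 + 0.2 * i / 200 for i in range(201)]   # Upsilon, K = 201

# Offline: store U(mu_bar) for every candidate (K n numbers) and choose r.
U_cand = [[field(x, mu) for x in X] for mu in candidates]
# Assumed heuristic: r = largest distance between neighboring candidates in U-space.
r = max(dist(U_cand[i], U_cand[i + 1]) for i in range(len(candidates) - 1))

# Online: simulate the experiment (mu_star is used only to generate the data).
mu_star = 0.55
rng = random.Random(1)
data = [[field(x, mu_star) + rng.gauss(0.0, sigma) for _ in range(m_prime)] for x in X]
V = [sum(row) / m_prime for row in data]
sigma_hat = math.sqrt(sum((d - V[k]) ** 2 for k in range(n) for d in data[k]) / (m - n))
k2 = m - n
F = (k2 / 2.0) * ((1.0 - gamma) ** (-2.0 / k2) - 1.0)   # closed form, valid for n = 2 only
eps = sigma_hat * math.sqrt(n * F) / math.sqrt(m_prime) + r

# (6) via ||beta(mu_bar) - beta_hat||_S = ||U(mu_bar) - V||: O(n) per candidate.
consistent = [mu for mu, U in zip(candidates, U_cand) if dist(U, V) <= eps]
print(len(consistent), min(consistent), max(consistent))
```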

We now turn to numerical results for the problem introduced in the previous section. We consider the case $\gamma = 0.95$, $m' = 16$, and $\sigma = 0.02$ for the particular candidate set $\Upsilon = \Xi$ with $K = 40{,}000$. In the Offline stage we estimate $r = 0.0136$ based on simple nearest-neighbor considerations in $\Upsilon$. Then, in the Online stage, we identify $\Upsilon_{\rm con}$ as shown in Figure 1.

In future work we will combine the results described here with model order reduction in order both to efficiently generate the EIM models (in the Offline stage) and also to efficiently assess the EIM contribution to the output error and parameter estimation (in the Online stage).
Figure 1. The plot depicts the consistent set $\Upsilon_{\rm con}$ in (6) for a particular realization. All the consistent parameter vectors lie within the range $\mu^* \pm \Delta\mu^*$, with $\mu^* = (0.55, 0.55)$ and $\Delta\mu^* = (0.003, 0.003)$; note the restricted range of the axes relative to the full parameter domain $\mathcal{D} = [0.4, 0.6]^2$.